Abstract

Many instruments, especially Likert-type scales, contain both positively- and 
negatively-worded items within the same scale (i.e., mixed item format). A major reason 
for this practice appears to be to discourage response sets from emerging. Using this 
format also helps the analyst to detect response sets that occur in data sets, and thus 
eliminate them from subsequent analyses. However, some psychometricians seriously 
question the use of mixed item formats, positing that positively- and negatively-worded 
items within a scale are not measuring the same underlying trait. Limited evidence has 
suggested that certain individuals are more predisposed to providing differential 
response patterns when responding to a mixed item format scale. However, to date, 
only a few characteristics of these differential-responding individuals have been 
identified. Thus, the purpose of the present study was to extend this line of research.
Specifically, the researchers analyzed responses to several scales utilizing mixed item
formats. For example, in a sample of 185 students, a canonical correlation analysis
revealed a relationship between the degree of differential responding to positively-
and negatively-worded items on three 6-item measures of foreign language anxiety 
(i.e., input, processing, and output anxiety) and several dimensions of self-perception, 
study skills, and locus of control. Implications of all findings are discussed. 
Profiles of Respondents Who Respond Inconsistently to Positively- and
Negatively-worded Items on Rating Scales 
Negatively-worded items, those phrased in the semantically opposite direction of 
the majority (Barnette, 2000), are often posed on surveys in an attempt to diminish non- 
attending behaviors such as acquiescence, satisficing, and response set. Cronbach 
(1946, 1950) termed the tendency of participants simply to agree with survey items 
acquiescence. Later, Couch and Keniston (1960) labeled the predisposition toward the 
direction of item wording regardless of content “yea- or nay-saying.” Krosnick, Narayan, 
and Smith (1996) attribute participants’ yea-saying to satisficing, or the proclivity to
agree with an item due to the exertion of minimal cognitive effort. Irrespective of the 
cause, both behaviors may lead to participant responses that may not accurately 
communicate the trait or belief that the surveyor sought to measure. Measurement 
error on surveys also may result from response set, or the tendency of participants to
respond to general feelings about the survey topic rather than the specific item content. 
Therefore, negatively-worded items are often presented on surveys under the 
assumption that the items will encourage respondents to process items more carefully. 
This greater attention to item content should, therefore, reduce both response set and 
satisficing, while also balancing the impact of yea- or nay-saying. Nunnally (1978)
proposed that a balance of positively- and negatively-worded items would virtually 
eliminate such behaviors. However, an increasing number of researchers are 
recommending that this practice should be undertaken with caution. 

Several studies have found that item orientation can potentially confound factor 
structure (Campbell & Grissom, 1979; Deemer & Minke, 1999; Eggers, 2000; Johnson 
& Osborn, 2000) and may often result in a separate factor for the negatively-worded
items (Anderson, Anderson, & Jenson, 1979; Ibrahim, 2001; Magazine, Williams, &
Williams, 1996; McInerney, McInerney, & Roche, 1994; Motl, Conroy, & Horan, 2000).
The presence of such a factor is disturbing in that it is “irrelevant to the trait being 
measured” (Ibrahim, 2001, p. 498). However, the debate over the cause of the
negative factor continues. To date, researchers have attributed the cause to “careless 
responses (Schmitt & Stults, 1985), insufficient cognitive ability (Cordery & Sevastos, 
1993), impaired response accuracy as a result of the negatively-worded items 
(Schriesheim, Eisenbach, & Hill, 1991; Schriesheim & Hill, 1981), and the actual 
measurement of a different construct (Pilotte & Gable, 1990)” (Magazine et al., 1996, p. 
247). 

Lustig (1963) investigated the relationship between the response styles of yea-
saying, nay-saying, and in-between-saying and perceptual aspects of personality in a
sample of 220 high school and college students. Lustig developed an instrument to
categorize participant response styles and then compared the style groups on several
personality traits. Findings indicated that yea-sayers viewed the world as more positive
and friendly, nay-sayers perceived the world as more negative and unfriendly, and in-
between-sayers saw the world as more ambivalent and uncertain. Yea-saying was the
more common problem, in that twice as many participants fell into that category.

A respondent characteristic that has been explored in several studies is age. 
Melnick and Gable (1990) examined responses to positively- and negatively-worded 
items on the Parent Attitudes Toward School Effectiveness survey (Gable, Murphy,
Hall, & Clark, 1986) completed by 3,328 parents. These researchers found that age,
education level, interest in topic, and readability of the instrument all impacted
responses to the mixed stem items. Eggers (2000) found that senior adults responded 
differently to the negatively-worded items, in that the items did not load on the 
anticipated factors; however, the data from the mid-life group fit the expected model. 
Benson and Hocevar (1985) detected an age effect in that middle school students were 
less likely to indicate agreement by disagreeing with a negatively-worded item. This
finding, however, could be attributed to reading ability. Marsh (1984) found that pre- 
adolescent children often responded to negatively-worded items inappropriately and 
detected a relationship between the inappropriate responses and reading ability that 
was independent of age. Therefore, the effect could have resulted from either age, 
educational level, or a combination of the two. 

Of course, no discussion of responses to survey items would be complete 
without discussing score reliability and validity. Barnette (2000) found a substantial 
decrease in reliability for the negatively-worded items on an attitude toward year-round 
schooling survey administered to high school students. Johnson and Osborn (2000) 
found an increase in reliability on responses to a theoretical orientation scale when yea- 
sayers and nay-sayers were dropped from the analysis. Sandoval and Lambert (1978) 
found that adding positively-worded items to a teacher-rated hyperactivity
instrument composed of negatively-worded items increased both score reliability and 
validity. Somewhat in contrast, Schriesheim and Hill (1981) found that negatively- 
worded items increased validity but had no impact on the reliability of scores from 
undergraduates providing responses to a presented scenario. 

Wright and Masters (1982) concluded that the differently worded items did not
provide consistent information. Congruent with this conclusion, using generalizability
theory, Chang (1995) established that negatively-worded items are not fully equivalent 
to their positively-worded counterparts. Moreover, Johanson, Gips, and Rich (1993) 
found that the negatively-worded items may result in a lack of information (i.e.,
nonresponse). They concluded that participants may be following the social norm that if
you do not have something nice to say then do not say anything at all; therefore, less 
favorable responses may be omitted more often. 

However, not all studies have found differences in responses to positively- and 
negatively-worded items. Marsh (1986) employed a construct-validity approach to data 
from three studies and found no reason to separate the items into positive and negative 
subscales. Similarly, Bergstrom and Lunz (1998) utilized item response theory and 
concluded that the positively- and negatively-worded items appeared to measure the 
same construct. On a dichotomous scale administered to middle school students, 
Williams, Bush, Par, Malone, and Jessup (2001) noted that responses to negatively- 
worded items were more strongly related to the criterion than were responses to 
positively-worded items. 

Therefore, discrepancies exist among the findings of studies exploring the impact 
of positively- and negatively-worded items. Primary causes for the incongruity include 
differences in survey content, methodology, participants, and criteria for analyzing the 
impact. The inconsistency could also be attributed to the comparison of responses to 
positively- and negatively-worded items that are not exact semantic opposites. Studies 
focusing on minimizing non-attending behaviors will undoubtedly produce varied
findings, because the degree of nonattentiveness elicited varies with both content
and participants. One participant characteristic that consistently emerges, however, is
age. It has yet to be determined whether the age of the participant is associated
with non-attending because participants are careless in their responses or because
they are confused by the negative wording.

While age has been identified as a potential characteristic, what other 
characteristics can be identified? Weems, Onwuegbuzie, Eggers, and Schrieber (2001) 
found that participants with the greatest differential in responses between positively- 
and negatively-worded items on a measure of research anxiety tended to have negative 
self-perceptions about their academic competence, to have the highest levels of hope 
associated with pathways, not to have tendencies towards cooperative learning, and 
not to be self-oriented perfectionists. However, as noted by the researchers, much 
more work is needed in this area. Therefore, the purpose of this study is to replicate 
and to extend Weems et al.’s study by investigating the characteristics of
respondents who respond differently to positively- and negatively-worded items. The 
data for this study came from two samples gathered in larger research efforts. 

STUDY 1 

Participants 

The sample comprised 185 students enrolled in Spanish (63.93%), French 
(25.57%), German (7.76%), or Japanese (2.74%) introductory-level courses at a mid-
southern university. The ages of the respondents ranged from 18 to 71 (M = 22.78, SD
= 6.92), with 33.2% being male. The mean grade point average (GPA) was 3.05 (SD =
0.59). Participation was voluntary. A series of Kruskal-Wallis one-way analyses of
variance revealed no differences (p > .05) across the courses with respect to the three
measures of foreign language anxiety, foreign language achievement, GPA, age, and
measures of self-perception, study skills, and locus of control; thus, responses of all 
participants were combined. 

Instruments and Procedure 

A battery of instruments was used in the study, namely: the Input Anxiety Scale 
(IAS), the Processing Anxiety Scale (PAS), the Output Anxiety Scale (OAS), the Self- 
Perception Profile for College Students (SPPCS), the Academic Locus of Control Scale 
(ALC), the Study Habits Inventory (SHI), and the Background Demographic Form
(BDF). Participants were given the questionnaire packet containing these instruments 
during the fourth week of the semester. They were instructed to complete the battery of 
instruments at home and to return it within two weeks. 

The three anxiety scales (i.e., the Input Anxiety Scale, the Processing Anxiety 
Scale, the Output Anxiety Scale) were developed by MacIntyre and Gardner (1994). 
Each scale contains six 5-point Likert-format items (i.e., 1 = strongly agree, 2 = agree, 3 
= neutral, 4 = disagree, 5 = strongly disagree) that assess how anxious students feel at 
the input, processing, and output stages of the foreign language learning process. For 
each scale, three items are positively worded and three items are negatively worded. 

All negative items were key-reversed before scoring, such that high scores on any of 
these scales represent high levels of anxiety at the corresponding stage. Sample items 
for the Input Anxiety Scale include, “I get flustered unless French/German/Spanish is 
spoken very slowly and deliberately” and “I get upset when I read in 
French/German/Spanish because I must read things again and again.” Sample items 
for the Processing Anxiety Scale include, “I am anxious with French/German/Spanish 
because, no matter how hard I try, I have trouble understanding it” and “I feel anxious if
French/German/Spanish class seems disorganized.” Finally, sample items for the 
Output Anxiety Scale include, “I may know the proper French/German/Spanish 
expression but when I am nervous it just won't come out” and “When I become anxious 
during a French/German/Spanish test, I cannot remember anything I studied.” For the 
present inquiry, the scores pertaining to the Input Anxiety Scale, the Processing Anxiety 
Scale, and the Output Anxiety Scale had classical theory alpha reliability coefficients of 
.70 (95% confidence interval [CI] = .63, .76), .73 (95% CI = .66, .79), and .76 (95% CI =
.70, .81), respectively.
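The key-reversal step can be illustrated with a short Python sketch. It assumes responses are stored as the integer codes 1 to 5 described above and that the positions of the negatively-worded items are known in advance; the item positions and responses in the example are hypothetical and are not the published item key for the MacIntyre and Gardner (1994) scales. The sketch returns separate scores for the positively- and negatively-worded items, taken here as item means (an assumption, since the text does not say whether means or sums were used).

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # 5-point Likert format of the IAS, PAS, and OAS


def reverse_key(score: int) -> int:
    """Key-reverse one response on a 1-5 scale (1 <-> 5, 2 <-> 4, 3 unchanged)."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"response {score} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - score


def score_by_wording(responses: list[int], negative_idx: set[int]) -> tuple[float, float]:
    """Return (mean of positive items, mean of key-reversed negative items).

    negative_idx holds 0-based positions of the negatively-worded items;
    the positions used below are hypothetical, not the published item key.
    """
    pos = [r for i, r in enumerate(responses) if i not in negative_idx]
    neg = [reverse_key(r) for i, r in enumerate(responses) if i in negative_idx]
    return mean(pos), mean(neg)


# Hypothetical respondent: six items, of which items 1, 3, and 5 are negatively worded.
pos_mean, neg_mean = score_by_wording([4, 2, 5, 1, 3, 2], negative_idx={1, 3, 5})
print(pos_mean, neg_mean)
```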

The SPPCS (Neemann & Harter, 1986) is a 54-item scale consisting of 13 
subscales (i.e., perceived creativity, perceived intellectual ability, perceived scholastic 
competence, perceived job competence, perceived athletic competence, perceived 
appearance, perceived romantic relationships, perceived social acceptance, perceived 
close friendships, perceived parent relationships, perceived humor, perceived morality, 
and perceived global self-worth). In order to ensure model parsimony, only the 
perceived intellectual ability, perceived scholastic competence, perceived social 
acceptance, and perceived global self-worth subscales of the SPPCS were analyzed in 
this study. For the present study, the classical theory alpha reliability coefficients 
pertaining to the selected subscales were as follows: perceived intellectual ability (.79; 
95% CI = .74, .84), perceived scholastic competence (.71; 95% CI = .64, .77),
perceived social acceptance (.83; 95% CI = .79, .87), and perceived global self-worth
(.87; 95% CI = .84, .90).

The ALC, developed by Trice (1985), has 28 true-false items related to personal 
control over academic outcomes. Scores range from 1 (strongly internal locus) to 28
(strongly external locus). For the current investigation, the classical theory alpha 
reliability coefficient for the ALC was .73 (95% CI = .67, .78).

The SHI, developed by Jones and Slate (1992), consists of 63 true-false items 
designed to assess the typical study behaviors of college students. Thirty items 
describe effective study behaviors, and 33 items specify ineffective study behaviors. 
The latter items are key-reversed such that total scale scores range from 0 to 63, with 
high scores indicating good study skills. For the present research, the classical theory 
alpha reliability coefficient for the SHI was .88 (95% CI = .85, .90).
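Each reliability figure in this section is a classical theory (Cronbach's) alpha, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). A minimal standard-library sketch follows; for dichotomous (0/1) items such as those on the ALC and SHI, the same formula equals KR-20. The response matrix is hypothetical, and the sketch does not compute the confidence intervals reported in the text, whose method the authors do not specify.

```python
from statistics import pvariance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (one row per respondent).

    Population variances are used throughout; the n vs. n - 1 choice cancels
    because the same divisor appears in the numerator and denominator.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses (four respondents, three true-false items).
kr_items = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(kr_items), 2))
```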

The BDF, developed specifically for this study, extracted relevant information 
such as age, gender, and students’ expectations for their overall average in their 
current language course. This latter variable was measured on a 100-point scale. 

Finally, foreign language achievement was measured using students' course 
averages. The course averages were measured on a 100-point scale. This global 
measure was selected instead of an isolated measure of specific skills in order to 
maximize the external validity (i.e., generalizability) of the findings. In order to adjust for 
differences in teacher characteristics (e.g., effectiveness, experience, motivation, and 
testing and scoring standards), standardized course averages were used instead of raw 
averages. Standardized course averages (i.e., z-scores) were computed for each 
student by subtracting the average achievement score of the foreign language class to 
which the student belonged from the student's course average, and then dividing by the 
class standard deviation. 
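The standardization of course averages can be sketched as follows. The class identifiers and scores are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was used. Each class needs at least two students for its standard deviation to be defined.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_class(records: list[tuple[str, float]]) -> list[float]:
    """Convert (class_id, course_average) pairs to within-class z-scores.

    Each student's average has the mean of that student's class subtracted
    and is divided by the class standard deviation (sample SD assumed).
    """
    by_class: dict[str, list[float]] = defaultdict(list)
    for class_id, avg in records:
        by_class[class_id].append(avg)
    stats = {c: (mean(v), stdev(v)) for c, v in by_class.items()}
    return [(avg - stats[c][0]) / stats[c][1] for c, avg in records]


# Hypothetical sections with different grading standards.
records = [("SPAN-A", 90.0), ("SPAN-A", 80.0), ("SPAN-A", 70.0),
           ("FREN-B", 75.0), ("FREN-B", 65.0), ("FREN-B", 55.0)]
print(standardize_within_class(records))  # prints [1.0, 0.0, -1.0, 1.0, 0.0, -1.0]
```

Both hypothetical classes yield z-scores of 1, 0, and -1 even though the second class's raw averages are 15 points lower, which is the kind of adjustment for differing grading standards the procedure is meant to provide.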

Analysis 
A series of dependent t-tests was used to compare scores from the positively-
worded items and scores from the negatively-worded items for each of the three anxiety
scales (i.e., IAS, PAS, OAS). Bonferroni’s adjustment was used to maintain an overall 
5% level of significance. For the major analysis, a canonical correlation analysis was 
used to determine the characteristics of students who had the greatest differential in 
responses between the positively- and negatively-worded items on the three measures 
of anxiety. Canonical correlation analysis is a technique used to assess the relationship 
between two sets of variables when each set contains at least two variables (Cliff & 
Krus, 1976; Darlington, Weinberg, & Walberg, 1973; Thompson, 1980, 1984). The 
absolute difference between scores on the positively-worded items (n = 3) and the
negatively-worded items (n = 3) for the IAS, PAS, and OAS served as the three
dependent variables (i.e., dependent multivariate set of variables), whereas the 
independent set of variables consisted of gender, age, grade point average, whether 
the foreign language was required, students’ expectations for their overall average in 
their current language course, number of high school foreign language courses taken, 
number of university foreign language courses taken, overall course achievement in the 
foreign language course, study habits, locus of control, perceived intellectual ability, 
perceived scholastic competence, perceived social acceptance, and perceived global 
self-worth. The number of canonical functions (i.e., factors) that can be generated for a 
given dataset is equal to the number of variables in the smaller of the two variable sets. 
Because three anxiety measures were involved, three canonical functions were 
generated. 
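The arithmetic behind this analysis plan can be sketched briefly: each respondent's dependent variable for a scale is the absolute difference between the positively- and negatively-worded item scores, the Bonferroni-adjusted per-test level for three t-tests is .05/3, and the number of canonical functions is the size of the smaller variable set (three anxiety differentials versus the fourteen independent variables listed above). The example scores are hypothetical item means on the 1-5 metric after key reversal; whether the authors differenced means or sums is not stated.

```python
FAMILY_ALPHA = 0.05  # overall 5% level maintained by the Bonferroni adjustment


def abs_differential(pos_score: float, neg_score: float) -> float:
    """Absolute positive-negative differential for one scale (same metric for both)."""
    return abs(pos_score - neg_score)


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the familywise level at family_alpha."""
    return family_alpha / n_tests


def n_canonical_functions(n_set1: int, n_set2: int) -> int:
    """Canonical functions available = number of variables in the smaller set."""
    return min(n_set1, n_set2)


# Hypothetical respondent: (positive-item mean, key-reversed negative-item mean).
scales = {"IAS": (3.33, 2.67), "PAS": (2.67, 3.33), "OAS": (3.00, 3.00)}
dvs = {name: abs_differential(p, n) for name, (p, n) in scales.items()}
per_test = bonferroni_alpha(FAMILY_ALPHA, n_tests=3)  # about .0167 per t-test
functions = n_canonical_functions(3, 14)              # 3 dependent vs. 14 independent
print(dvs, round(per_test, 4), functions)
```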

Results 
The series of dependent t-tests revealed that for the IAS, scores from the
positively-worded items (M = 3.11, SD = 0.68) were statistically significantly (t = 3.72, p
< .001) higher than scores from the negatively-worded items (M = 2.92, SD = 0.83). The
effect size associated with this difference, as measured by Cohen’s (1988) d, was 0.25,
which could be considered indicative of a small effect. Conversely, for the
PAS, scores from the negatively-worded items (M = 3.04, SD = 0.75) were statistically
significantly (t = 5.12, p < .0001) higher than scores from the positively-worded items
(M = 2.76, SD = 0.81). The associated effect size of .36 was moderate (Cohen, 1988).
Finally, for the OAS, no statistically significant difference (t = 0.39, p > .05) emerged
between scores from the positively-worded items (M = 3.18, SD = 0.76) and scores 
from the negatively-worded items (M = 3.16, SD = 0.82). 
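The effect sizes above can be reproduced to two decimals by dividing the mean difference by the square root of the average of the two variances. This formula is inferred from the reported means and standard deviations rather than stated by the authors; other paired-data variants (for example, dividing by the standard deviation of the difference scores) would give different values.

```python
from math import sqrt


def cohens_d_avg_sd(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Cohen's d using the root mean square of the two SDs as the standardizer."""
    return (m1 - m2) / sqrt((sd1 ** 2 + sd2 ** 2) / 2)


# Means and SDs as reported for Study 1.
d_ias = cohens_d_avg_sd(3.11, 0.68, 2.92, 0.83)  # positive minus negative items
d_pas = cohens_d_avg_sd(3.04, 0.75, 2.76, 0.81)  # negative minus positive items
print(round(d_ias, 2), round(d_pas, 2))  # prints 0.25 0.36
```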

A series of score reliability coefficients was computed on the positively- and 
negatively-worded items pertaining to the three anxiety scales. For each scale, the 
score reliability indices for the positively-worded items were very different from those for
the negatively-worded items. Specifically, for the IAS, the classical theory alpha
reliability coefficient was .31 (95% CI = .12, .47) for the positively-worded items and .74
(95% CI = .67, .80) for the negatively-worded items. For the PAS, the score reliability
coefficient was .71 (95% CI = .63, .78) for the positively-worded items and .55 (95% CI
= .42, .65) for the negatively-worded items. Finally, for the OAS, the score reliability
coefficient was .59 (95% CI = .48, .68) for the positively-worded items and .70 (95% CI
= .62, .77) for the negatively-worded items.

The strength of the relationship between the two sets of variables was assessed 
by examining the magnitude of the canonical correlation coefficients. These 
coefficients indicate the degree of relationship between the weighted anxiety variables
and the weighted set of independent variables. In addition, the significance of the
canonical roots was tested via the F-statistic based on Rao's approximation (Rao,
1952). (The full correlation matrix that generated the canonical correlation analysis,
although not presented due to space constraints, can be obtained from the authors.)

The canonical analysis revealed that all three canonical correlations combined
were statistically significant (F[42, 499.13] = 1.42, p < .05). Further, when the first
canonical root was excluded, the remaining two canonical roots were not statistically
significant. Similarly, when the first two canonical roots were excluded, the remaining
canonical root was not statistically significant. Together, these results suggested that
only the first canonical function (Rc1 = .47) was statistically significant; it represented
a moderate effect size, contributing 21.7% (i.e., Rc1²) to the shared variance. Neither the
second (Rc2² = 5.1%) nor the third (Rc3² = 3.7%) canonical function contributed
much to the shared variance. Thus, only the first canonical correlation was interpreted.
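The omnibus test above can be checked from the reported figures. Wilks' lambda for the full set of roots is the product of (1 - Rc²) over all canonical functions, and Rao's approximation converts it to an F statistic with p × q numerator degrees of freedom. The sketch uses the squared canonical correlations reported above (.217, .051, .037), p = 3 anxiety differentials, q = 14 independent variables, and N = 185, assuming no cases were lost to missing data; under those assumptions it returns approximately F(42, 499.13) = 1.42. The step-down tests that drop the first root(s) are not shown.

```python
from math import prod, sqrt


def wilks_lambda(canonical_r2: list[float]) -> float:
    """Wilks' lambda from squared canonical correlations: product of (1 - Rc^2)."""
    return prod(1 - r2 for r2 in canonical_r2)


def rao_f(lam: float, p: int, q: int, n: int) -> tuple[float, int, float]:
    """Rao's F approximation for Wilks' lambda.

    p and q are the sizes of the two variable sets and n is the number of cases.
    Returns (F, df1, df2).
    """
    df1 = p * q
    denom = p ** 2 + q ** 2 - 5
    s = sqrt((p ** 2 * q ** 2 - 4) / denom) if denom > 0 else 1.0
    m = n - 1 - (p + q + 1) / 2
    df2 = m * s - df1 / 2 + 1
    root = lam ** (1 / s)
    return (1 - root) / root * df2 / df1, df1, df2


# Squared canonical correlations and set sizes as reported for Study 1.
lam = wilks_lambda([0.217, 0.051, 0.037])
f, df1, df2 = rao_f(lam, p=3, q=14, n=185)
print(f"F[{df1}, {df2:.2f}] = {f:.2f}")  # prints F[42, 499.13] = 1.42
```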

Data pertaining to the first canonical root are presented in Table 1. This table
provides both the standardized function coefficient and the structure coefficient 
pertaining to the first canonical correlation. An examination of the standardized 
canonical function coefficients revealed that, using a cutoff correlation of 0.3 
recommended by Lambert and Durand (1975) as an acceptable minimum loading 
value, positive-negative differentials for both the IAS and PAS made important 
contributions to the anxiety composite, with the differential corresponding to the PAS 
making the greatest contribution. Interestingly, the IAS and PAS differentials were
inversely related, consistent with the dependent t-test results. With respect to the
independent variables set, locus of control and students’ expectations for their overall
average in their current language course made important contributions to the composite 
set. 

Similarly, the structure coefficients pertaining to the first canonical correlation 
revealed that positive-negative differentials for both the IAS and PAS made important 
contributions to the model, again in an inverse manner. Also, the following seven 
independent variables made important contributions: students’ expectations for their 
overall average in their current language course, study habits, locus of control, 
perceived intellectual ability, perceived scholastic competence, perceived social 
acceptance, and perceived global self-worth. 

Interestingly, the squared structure coefficients in Table 1 indicate that locus of
control was the best predictor of positive-negative differentials, followed by perceived
global self-worth, study habits, perceived social acceptance, perceived intellectual
ability, perceived scholastic competence, and students’ expectations for their overall
average in their current language course, respectively.



Insert Table 1 about here 



STUDY 2 

Participants 

The archival data set used for the second study represented a convenience 
sample of 86 participants recruited from a client database of a college disability 
services center. Participation was voluntary and anonymous, and informed consent was
obtained. The ages of the participants ranged from 18 to 51 (M = 27.10, SD = 10.20),
with 82% enrolled in undergraduate programs and 18% enrolled in graduate
programs. The majority of the sample was female (58.8%), Caucasian (93%), and had
never been married (80%). Cumulative undergraduate GPA of participants was
reported as 8.33% below 2.0, 51.19% from 2.0 to 2.9, and 40.48% 3.0 or higher.

Instrument

The data were collected for the purpose of examining the relationship between 
an individual’s sense of coherence and dysfunctional career thoughts. The Sense of 
Coherence Scale (SOCS; Antonovsky, 1987) is a 29-item scale measuring sense of
coherence, with subscales addressing comprehensibility, manageability (MA), and
meaningfulness (ME). The SOCS utilizes seven-point bipolar adjective scales, with
unique anchoring adjectives provided for virtually all items. The comprehensibility scale was
not utilized in this study because only two items were negatively-worded. Sample items 
from the MA scale include “Has it happened that people whom you counted on 
disappoint you? (Adjectives: never happened and always happened)” and “What best
describes how you see life? (Adjectives: one can always find a solution to painful things
in life and there is no solution to painful things in life).” Sample items from the ME scale
include “Life is: (Adjectives: full of interest and completely routine)” and “When you
think about life, you very often: (Adjectives: feel how good it is to be alive and ask 
yourself why you exist at all).” For the present study, responses for scales were 
averaged with higher scores indicating a stronger sense of coherence. Several 
researchers have examined the criterion validity of scores obtained by the SOCS. 
Antonovsky (1993) reviewed 42 studies and found positive correlations between SOCS 
and (a) health and well-being, (b) self-esteem, (c) social skills, and (d) social support;
and negative correlations with anxiety and perceived stressors. For the present study,
the classical theory alpha reliability coefficient for the MA scale was .83 (95% CI = .77,
.88) and for the ME scale was .85 (95% CI = .79, .89).

Procedures 

To identify characteristics of participants who responded differently to negatively- 
worded and positively-worded items, separate scale scores were calculated by item 
wording for both the MA and ME indices. The absolute differences between the scores
were then used as the dependent variables in separate regression analyses. Because of the
mixed results concerning item orientation, scale means and reliabilities were compared 
before regressing the difference scores. 

Results 

Reliabilities for the positively- and negatively-worded items within the two scales
were not compared in this study because the scales divided asymmetrically, leaving few
items in each subset, and because scale length affects scale reliability.
After deleting one case for an out-of-range value, scores from the positively-worded MA 
scale (M = 4.55, SD = 1.15) and the negatively-worded MA scale (M = 5.02, SD = 1.14) 
were compared with a dependent t-test. Results indicated that the scores from the two 
scales differed significantly (t(84) = 5.128, p < .001), with higher scores on the
negatively-worded scale and a moderate effect size (d = 0.55). A dependent t-test also was used to
compare scores from the positively-worded ME scale (M = 5.25, SD = 1.27) and the
negatively-worded ME scale (M = 5.32, SD = 1.04). No statistically significant
differences emerged between responses to the positively- and negatively-worded items
on the ME scale (t(84) = 0.771, p = .443).
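The dependent t-test used in both studies can be sketched from raw paired scores. The function returns t, its degrees of freedom (number of pairs minus one), and d_z, the mean difference divided by the standard deviation of the differences, which is one common standardized effect size for paired data. The scores are hypothetical. For reference, t/√n for the MA comparison is 5.128/√85 ≈ 0.56, close to the reported d = 0.55, but which standardizer the authors used is not stated.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> tuple[float, int, float]:
    """Dependent (paired) t statistic for matched scores x and y.

    Returns (t, df, d_z), where d_z = mean difference / SD of differences.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    md, sd = mean(diffs), stdev(diffs)
    n = len(diffs)
    t = md / (sd / sqrt(n))
    return t, n - 1, md / sd


# Hypothetical subscale means for five respondents.
neg = [5, 6, 4, 5, 6]  # negatively-worded items (after key reversal)
pos = [4, 5, 4, 4, 5]  # positively-worded items
t, df, d_z = paired_t(neg, pos)
print(round(t, 3), df, round(d_z, 3))
```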

A standard multiple regression was performed, with the absolute value of the 
difference scores on the MA scale serving as the dependent variable and gender, age, 
marital status (never married vs. other), and GPA (below 2.0, 2.0 to 2.9, and 3.0 or 
higher) entered as independent variables. The assumptions for regression seemed 
reasonable. A p < .001 criterion for Mahalanobis distance revealed no outliers
(Maximum = 17.72), and Cook’s d did not highlight any influential observations
(Maximum = 0.13). The normal probability plot of the standardized residuals suggested
normality, and multicollinearity did not appear to be a problem (the maximum
variance inflation factor was 2.35). Missing data forced the omission of 5
cases, resulting in a final sample size of n = 80. The multiple correlation coefficient, R,
however, was not statistically significantly different from zero, F(4, 75) = 0.69, p > .05.
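The outlier screen above compares each case's Mahalanobis distance (D²) with a chi-square distribution whose degrees of freedom equal the number of predictors. The sketch assumes four predictors, as implied by F(4, 75), and checks the reported maximum D² of 17.72 against the p < .001 criterion (the corresponding critical value is about 18.47). Computing D² itself requires the predictors' covariance matrix and is not shown. The closed-form tail probability used here holds only for even degrees of freedom; a general chi-square routine would be needed otherwise.

```python
from math import exp


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


def is_multivariate_outlier(d2: float, n_predictors: int, alpha: float = 0.001) -> bool:
    """Flag a case whose Mahalanobis D^2 is significant at alpha on n_predictors df."""
    return chi2_sf_even_df(d2, n_predictors) < alpha


p_max = chi2_sf_even_df(17.72, 4)  # largest reported D^2, four predictors assumed
print(round(p_max, 4), is_multivariate_outlier(17.72, 4))  # prints 0.0014 False
```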

A standard multiple regression also was performed, with the absolute value of 
the difference scores on the ME scale serving as the dependent variable and gender, 
age, marital status (never married vs. other), and GPA (below 2.0, 2.0 to 2.9, and 3.0 or 
higher) entered as independent variables. The assumptions for regression seemed 
reasonable. A p < .001 criterion for Mahalanobis distance detected no outliers
(Maximum = 17.72), and Cook’s d indicated no influential observations
(Maximum = 0.09). The normal probability plot of the standardized residuals suggested
normality, and multicollinearity did not appear to be a problem (the maximum
variance inflation factor was 2.35). Missing data forced the omission of 5 cases,
resulting in a final sample size of 80. Again, the multiple correlation coefficient, R,
was not statistically significantly different from zero, F(4, 75) = 1.16, p > .05.
DISCUSSION

Many instruments, especially Likert-type scales, contain both positively- and 
negatively-worded items within the same scale (i.e., mixed item format). A major reason 
for this practice appears to be to discourage response sets from emerging. Using this 
format also helps the analyst to detect response sets that occur in data sets, and thus 
eliminate them from subsequent analyses. However, some psychometricians (e.g., 
Barnette, 2000) seriously question the use of mixed item formats, positing that 
positively- and negatively-worded items within a scale are not measuring the same 
underlying trait. 

Recently, evidence has suggested that certain individuals are more predisposed 
to providing differential response patterns when responding to a mixed item format 
scale (Weems et al., 2001). However, to date, only a few characteristics of these 
differential-responding individuals have been identified. Thus, the purpose of the
present study was to extend this line of research. Specifically, two studies were
conducted in which responses to several scales that utilize mixed item formats were 
analyzed. 

Participants in Study 1 responded differently to each of the three measures of 
foreign language anxiety. Whereas responses to the positively- and negatively-worded 
items of the Output Anxiety Scale were very consistent, this was not the case for either 
the Input Anxiety Scale or the Processing Anxiety Scale. Moreover, these two latter scales
evoked opposite sets of response patterns. Specifically, for the IAS scale, scores 
pertaining to the positively-worded items were moderately higher than were scores from 
the negatively-worded items, whereas the converse was true for the PAS. These 
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findings combined suggest that on two of the three anxiety scales, the positively- and 
negatively-worded items possibly measured different constructs. 
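This section does not spell out how each respondent's differential was scored, so the following is only a minimal sketch under two stated assumptions: items use a 1-5 Likert scale, and the differential is the mean of the positively-worded items minus the mean of the reverse-coded negatively-worded items. The function names and the example respondent are hypothetical.

```python
from statistics import mean


def reverse_score(x: int, low: int = 1, high: int = 5) -> int:
    """Reverse-code a Likert response so both item sets point in the same direction."""
    return (low + high) - x


def differential_response(pos: list[int], neg: list[int], low: int = 1, high: int = 5) -> float:
    """Mean of positively-worded items minus mean of reverse-coded negatively-worded items.

    A value near 0 means the two item sets were answered consistently; the sign shows
    which wording drew the higher scores, and abs() gives the size of the discrepancy.
    """
    return mean(pos) - mean(reverse_score(x, low, high) for x in neg)


# Hypothetical respondent on a 6-item scale (3 positively- and 3 negatively-worded items).
print(differential_response([5, 4, 5], [2, 1, 2]))
```

Keeping the sign allows opposite patterns, such as those on the IAS and PAS, to be distinguished; taking the absolute value instead measures only the size of the discrepancy.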

Interestingly, certain characteristics of the participants made them more likely to 
generate differential patterns of responses to the positively- and negatively-worded 
items. In particular, undergraduate students who had the highest expectations for their 
overall average in their current language course, the most effective study habits, the 
most internal locus of control, and the highest levels of perceived intellectual ability, 
perceived scholastic competence, perceived social acceptance, and perceived global 
self-worth tended to have the greatest differential in responses between positively- and 
negatively-worded items on the Input Anxiety Scale and the least differential on the 
Processing Anxiety Scale. Simply put, students with the most positive orientation with 
respect to course expectations, study habits, locus of control, and self-perceptions 
tended to provide more extreme responses to the positive items on the Input Anxiety 
Scale and to provide more extreme responses to the negative items on the Processing 
Anxiety Scale. The reverse was true for students with the least positive orientations. 
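The interpretation above rests on the structure coefficients in Table 1. The table's third column is the square of each structure coefficient expressed as a percentage, that is, the share of a variable's variance it has in common with its canonical variate. The sketch below uses a subset of the Table 1 values to reproduce that column and rank variables by shared variance. The opposite signs for the Input and Processing Anxiety Scales correspond to the pattern described above. The dictionary and function names are illustrative only.

```python
# Structure coefficients (r_s) for the first canonical function, taken from Table 1 (subset).
structure = {
    "Input Anxiety Scale": 0.62,
    "Processing Anxiety Scale": -0.70,
    "Output Anxiety Scale": 0.14,
    "students' expectations": 0.47,
    "study habits": 0.56,
    "locus of control": -0.73,
    "perceived intellectual ability": 0.52,
    "perceived scholastic competence": 0.51,
    "perceived social acceptance": 0.53,
    "perceived global self-worth": 0.57,
    "number of university language courses": -0.28,
}


def squared_structure_pct(r_s: float) -> float:
    """Percentage of a variable's variance shared with its canonical variate."""
    return 100.0 * r_s ** 2


# Rank by shared variance; the sign of r_s gives the direction of the relationship.
for name, r_s in sorted(structure.items(), key=lambda kv: -abs(kv[1])):
    direction = "+" if r_s > 0 else "-"
    print(f"{name:<40} {direction} {squared_structure_pct(r_s):6.2f}%")
```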

Weems et al. (2001) found that graduate students with the greatest differential in 
responses between positively- and negatively-worded items tended to have certain 
traits. The results of Study 1 are consistent with this finding. Both studies suggest 
that strongly disagreeing with a positively-worded item is not equivalent to strongly 
agreeing with a negatively-worded item. Thus, using mixed item formats may 
represent an important threat to the content- and construct-related validity of a scale, 
especially for certain individuals. In fact, because the positively- and negatively-worded 
items on two of the scales in Study 1 induced very different responses, it appears that 
the use of multidimensional scales that contain mixed item formats within the same study 
can yield even more complex differential response patterns. 

However, the most disturbing aspect of the data in Study 1 was the fact that 
across the three scales, the reliability indices pertaining to scores on the positively- and 
negatively-worded items were very different. Whereas for the PAS, the score reliability 
coefficient was 29.1% higher for the positively-worded items, for the IAS and OAS, the 
score reliability coefficient was 138.7% and 18.6% higher for the negatively-worded 
items, respectively. Thus, although the IAS, PAS, and OAS all yielded scores 
with adequate reliability coefficients (i.e., ≥ .70; Nunnally & Bernstein, 1994), it is likely 
that these indices were attenuated by the use of mixed item formats. That is, for the 
PAS, use of positively-worded items severely reduced score reliability. Conversely, for 
both the IAS and OAS, use of negatively-worded items severely reduced score 
reliability. 
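This section does not name the reliability coefficient. Assuming it is coefficient alpha (a common choice, though not confirmed here), the comparison above amounts to computing alpha separately for each item subset and expressing the difference as a percentage of the smaller coefficient. The sketch below illustrates that procedure. The function names and the five-respondent data set are hypothetical and are not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Coefficient alpha for a set of items.

    `items` holds one list per item, each containing every respondent's score on that item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]  # each respondent's subset total
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def percent_higher(a: float, b: float) -> float:
    """How much larger coefficient a is than coefficient b, as a percentage of b."""
    return 100.0 * (a - b) / b


# Hypothetical scores for five respondents on a 6-item scale: three positively-worded
# items and three negatively-worded items (the latter already reverse-coded).
positive = [[4, 3, 5, 2, 4], [4, 2, 5, 2, 3], [5, 3, 4, 1, 4]]
negative = [[3, 3, 4, 2, 2], [3, 4, 4, 2, 1], [4, 3, 5, 1, 2]]
alpha_pos = cronbach_alpha(positive)
alpha_neg = cronbach_alpha(negative)
print(f"alpha(+) = {alpha_pos:.2f}, alpha(-) = {alpha_neg:.2f}, "
      f"difference = {percent_higher(alpha_pos, alpha_neg):.1f}%")
```

Population and sample variances give the same alpha as long as one is used consistently, because the divisor cancels in the ratio.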

Applying the Spearman-Brown prophecy formula (Crocker & Algina, 1986) suggests that if 
the three positively-worded items on the PAS had been replaced by three parallel 
negatively-worded items, the score reliability for the total scale would increase from .70 
to .82, which represents a 17.1% increase. Similarly, if the three negatively-worded 
items on the IAS had been replaced by three parallel positively-worded items, the score 
reliability for the total scale would increase from .73 to .84 (a 15.1% increase). Also, if 
the three negatively-worded items on the OAS had been replaced by three parallel 
positively-worded items, the score reliability for the total scale would increase from .76 
to .86 (a 13.2% increase). This is extremely compelling evidence against the use of 
mixed item formats. 
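The Spearman-Brown prophecy formula predicts the reliability ρ* of a test lengthened by a factor n with parallel items: ρ* = nρ / (1 + (n − 1)ρ). This section does not state which subset coefficients were stepped up to obtain the projected values, so the sketch below only shows the formula and checks the reported percentage increases. The function names are hypothetical.

```python
def spearman_brown(rho: float, n: float) -> float:
    """Predicted reliability after multiplying test length by n, assuming parallel items."""
    return n * rho / (1 + (n - 1) * rho)


def percent_increase(new: float, old: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return 100.0 * (new - old) / old


# Example: doubling a test (n = 2) whose reliability is .50.
print(f"{spearman_brown(0.50, 2):.3f}")

# Observed and projected total-scale reliabilities reported above.
for scale, observed, projected in [("PAS", 0.70, 0.82), ("IAS", 0.73, 0.84), ("OAS", 0.76, 0.86)]:
    print(f"{scale}: {observed:.2f} -> {projected:.2f} "
          f"({percent_increase(projected, observed):.1f}% increase)")
```

The formula assumes the replacement items are strictly parallel to the items they replace. Real rewritten items rarely meet that assumption, so the projections are best read as upper-bound illustrations.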

The results from Study 2 did not replicate these findings. The only similarity was 
a significant difference between responses to the positively- and negatively-worded 
items on the MA scale. Multiple regression analyses for neither the MA nor the ME scale 
uncovered respondent characteristics that were associated with differential response 
patterns. Study 2, however, was limited by the small number of participant characteristics 
available in the archival data set, and the small sample size greatly attenuated the 
statistical power of the regression analysis. 

However, a key distinction of the second study was the type of response options 
offered. Rather than responding to a series of Likert-type items, participants responded 
using bipolar adjectives. The adjectives offered changed with virtually every item, and 
the orientation of the items was randomly alternated between positively- and 
negatively-worded. Thus, participants may have responded more carefully because of a 
novelty effect, which would have virtually eliminated differential responses to 
reverse-coded items arising from carelessness and/or response sets. The bipolar 
adjective format also differed from the Likert-type format in that participants were not 
asked to “disagree.” Therefore, the response options did not facilitate either 
“yea-saying” or “nay-saying” by participants. 

CONCLUSION 

The first study yielded extremely compelling evidence against the use of mixed 
item formats. In this investigation, responses to positively- and negatively-worded items 
were significantly different. Indeed, in the case of the IAS, the response options 
appeared to facilitate “nay-saying,” whereas with respect to the PAS, “yea-saying” 
seemed to prevail among the participants. Further, the extent of the response 
differential was a function of various individual characteristics. Moreover, across all 
three anxiety scales, the item format that yielded the highest mean scores also yielded 
the significantly lower score reliability coefficient. These results suggest that writing an 
item that is inappropriately worded (i.e., either positively- or negatively-worded) induces 
artificially extreme responses, which, in turn, attenuate the score reliability of the scale. 
As was the case in the study of Weems et al. (2001), the first set of results strongly 
suggests that the use of positively- and negatively-worded items within the same scale 
may seriously threaten both score reliability and score validity. As Weems et al. advised, 
mixed item formats should be used with extreme caution. 

In the second study, the significant difference between the positively- and 
negatively-worded items on the MA scale lends further support to the suggestion that the 
two item wordings possibly measured different constructs. The second study also 
suggests that more research is needed to ascertain whether responses to Likert-formatted 
scales are more adversely affected by mixed item formats than are responses to other 
formats, such as bipolar adjectives, or whether the lack of findings in Study 2 resulted 
from low statistical power. In light of the findings from the present study and similar 
findings in the literature reviewed, a replication of Lustig’s (1963) study is warranted, 
utilizing his instrument designed to identify acquiescence, to explore further the 
relationship between differential response patterns and personality characteristics. 

Two main conclusions emerge. First, together the two studies suggest that either 
(a) respondents process positively-worded items differently than negatively-worded items 
or (b) respondents do not read the negatively-worded items as carefully as they do 
positively-worded items and, because of a response set, simply agree with the 
negatively-worded items instead of disagreeing. Second, several characteristics were 
identified pertaining to those who tend to have the largest absolute discrepancies in 
responses between the two sets of items. 

Bearing in mind that such scale formats can attenuate reliability estimates 
(Weems & Onwuegbuzie, 2001), researchers who use such formats should scrutinize 
the resultant score reliability. Further, researchers should refrain from comparing 
subscales that have different item wording formats. In particular, comparing a subscale 
with positively- and negatively-worded items to another subscale whose items are all 
worded in the same direction is likely to yield misleading results, arising from 
differential response patterns that have little to do with the actual constructs of 
interest and instead reflect acquiescence. 
Table 1

Canonical Solution for First Function

Variable                                  Standardized  Structure     Squared Structure
                                          Coefficient   Coefficient   Coefficient (%)

Criterion Set:
  Input Anxiety Scale                      0.63          0.62         38.44
  Processing Anxiety Scale                -0.81         -0.70         49.00
  Output Anxiety Scale                     0.27          0.14          1.96

Predictor Set:
  gender                                  -0.03         -0.05          0.25
  age                                     -0.16         -0.10          1.00
  grade point average                     -0.24         -0.10          1.00
  foreign language required                0.13          0.16          2.56
  students’ expectations                   0.45*         0.47*        22.09
  number of high school language courses   0.15          0.12          1.44
  number of university language courses  -0.19         -0.28          7.84
  overall language course achievement    -0.26         -0.03          0.09
  study habits                             0.19          0.56*        31.36
  locus of control                        -0.57*        -0.73*        53.29
  perceived intellectual ability           0.10          0.52*        27.04
  perceived scholastic competence         -0.01          0.51*        26.01
  perceived social acceptance              0.02          0.53*        28.09
  perceived global self-worth              0.12          0.57*        32.49

* Loadings with large effect sizes (Lambert & Durand, 1975).